Distributed Kernel Principal Component Analysis

Authors

  • Maria-Florina Balcan
  • Yingyu Liang
  • Le Song
  • David P. Woodruff
  • Bo Xie
Abstract

Kernel Principal Component Analysis (KPCA) is a key technique in machine learning for extracting the nonlinear structure of data and pre-processing it for downstream learning algorithms. We study the distributed setting in which there are multiple workers, each holding a set of points, who wish to compute the principal components of the union of their point sets. Our main result is a communication-efficient algorithm that takes as input arbitrary data points and computes a set of global principal components that give a relative-error approximation for polynomial kernels, or a relative-error approximation with an arbitrarily small additive error for a wide family of kernels including Gaussian kernels. While recent work shows how to do PCA in a distributed setting, the kernel setting is significantly more challenging. Although the “kernel trick” is useful for efficient computation, it is unclear how to use it to reduce communication. The main problem with previous work is that it requires communication proportional to the dimension of the data points, which in the kernel setting would be proportional to the dimension of the feature space or to the number of examples, both of which could be very large. We instead first select a small subset of points whose span contains a good approximation (the column subset selection problem, CSS), and then use sketching for low-rank approximation to achieve our result. The column subset selection is done using a careful combination of oblivious subspace embeddings for kernels, oblivious leverage score approximation, and adaptive sampling. These yield a nearly optimal communication bound for CSS, and also a nearly optimal communication bound for KPCA in the constant-approximation regime. Experiments on large-scale real-world datasets show the efficacy of our algorithm.
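To illustrate the overall approach described above (each worker communicates only a small set of representative points, and approximate kernel principal components are then computed in the span of those points), the following is a minimal Python/NumPy sketch. It is not the paper's algorithm: uniform landmark sampling stands in for the leverage-score based column subset selection and adaptive sampling, no sketching is applied, kernel centering is omitted, and the names rbf_kernel and distributed_kpca_sketch are hypothetical.

    import numpy as np

    def rbf_kernel(X, Y, gamma=0.5):
        # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.
        sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
        return np.exp(-gamma * np.maximum(sq, 0.0))

    def distributed_kpca_sketch(worker_data, landmarks_per_worker=20, k=2, gamma=0.5, seed=0):
        # Toy distributed KPCA: each worker sends a small uniform sample of its
        # points to the coordinator (a stand-in for the paper's leverage-score /
        # adaptive column subset selection), and the coordinator computes
        # approximate kernel principal components in the span of those landmarks
        # (a Nystrom-style low-rank approximation, without kernel centering).
        rng = np.random.default_rng(seed)
        landmarks = np.vstack([
            X[rng.choice(len(X), size=min(landmarks_per_worker, len(X)), replace=False)]
            for X in worker_data
        ])
        K_ll = rbf_kernel(landmarks, landmarks, gamma)
        vals, vecs = np.linalg.eigh(K_ll)           # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:k]            # keep the k largest
        vals, vecs = vals[top], vecs[:, top]

        def transform(X):
            # Map points to their top-k approximate kernel principal components.
            return rbf_kernel(X, landmarks, gamma) @ vecs / np.sqrt(np.maximum(vals, 1e-12))

        return transform

    # Usage: three workers, each holding 200 points from a noisy circle.
    rng = np.random.default_rng(1)
    workers = []
    for _ in range(3):
        theta = rng.uniform(0.0, 2.0 * np.pi, 200)
        workers.append(np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2)))
    transform = distributed_kpca_sketch(workers, k=2)
    print(transform(workers[0]).shape)  # (200, 2)

The property this sketch shares with the paper's algorithm is that communication grows with the number of selected points, not with the total number of examples or the dimension of the kernel feature space.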


Similar Articles

Object Recognition based on Local Steering Kernel and SVM

The proposed method recognizes objects by applying Local Steering Kernels (LSK) as descriptors to image patches. To represent the local properties of the images, patches are extracted where variations occur in an image. A wavelet-based salient point detector is used to find the interest points. The Local Steering Kernel is then applied to the resulting pixels, in ...


Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...


Optimal Convergence for Distributed Learning with Stochastic Gradient Methods and Spectral-Regularization Algorithms

We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We first investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds can be retained for distributed SGM provided that the partition level is not t...


Kernel Discriminant Analysis Based on Canonical Differences for Face Recognition in Image Sets

A novel kernel discriminant transformation (KDT) algorithm based on the concept of canonical differences is presented for automatic face recognition applications. For each individual, the face recognition system compiles a multi-view facial image set comprising images with different facial expressions, poses and illumination conditions. Since the multi-view facial images are non-linearly distri...


Probabilistic Analysis of Kernel Principal Components

This paper presents a probabilistic analysis of kernel principal components by unifying the theory of probabilistic principal component analysis and kernel principal component analysis. It is shown that, while the kernel component enhances the nonlinear modeling power, the probabilistic structure offers (i) a mixture model for nonlinear data structure containing nonlinear sub-structures, and (i...



Journal:
  • CoRR

Volume: abs/1503.06858

Pages: -

Publication date: 2015